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Abstract. We investigate the ability of a genetic algorithm to design cellular au- 
QO ! tomata that perform computations. The computational strategies of the resulting 

cellular automata can be understood using a framework in which "particles" embed- 
ded in space-time configurations carry information and interactions between particles 
effect information processing. This structural analysis can also be used to explain the 
evolutionary process by which the strategies were designed by the genetic algorithm. 
• More generally, our goals are to understand how machine-learning processes can design 

complex decentralized systems with sophisticated collective computational abilities and 
to develop rigorous frameworks for understanding how the resulting dynamical systems 
perform computation. 
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1. Introduction 

From the earliest days of computer science, researchers have been interested in making 
computers and computation more like information-processing systems in nature. In the 
1940's and 1950's, von Neumann viewed the new field of "automata theory" as closely related 
to theoretical biology, and asked questions such as "How are computers and brains alike?" 
[75] and "What is necessary for an automaton to reproduce itself?" [76] . Turing was deeply 
interested in the mechanical roots of human-like intelligence [72], and Weiner looked for 
links among the functioning of computers, nervous systems, and societies [79] . More recently 
work on biologically and sociologically inspired computation has received renewed interest; 
researchers are borrowing information-processing mechanisms found in natural systems such 
as brains [7, 28, 63], immune systems [25, 31], insect colonies [9, 21], economies [78, 80], and 
biological evolution [2, 29, 40]. The motivation behind such work is both to understand how 
systems in nature adaptively process information and to construct fast, robust, adaptive 
computational systems that can learn on their own and perform well in many environments. 

Although there are some commonalities, natural systems differ considerably from tradi- 
tional von Neumann-style architectures 1 . Biological systems such as brains, immune systems, 

*It should be noted that although computer architectures with central control, random access memory, 
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and insect societies consist of myriad relatively homogeneous components that are extended 
in space and operate in parallel with no central control and with only limited communication 
among components. Information processing in such systems arises from coordination among 
large-scale patterns that are distributed across components (e.g., distributed activations of 
neurons or activities of antibodies). Such decentralized systems, being highly nonlinear, of- 
ten exhibit complicated, difficult-to-analyze, and unpredictable behavior. The result is that 
they are hard to control and "program". It seems clear that in order to design and under- 
stand decentralized systems and to develop them into useful technologies, engineers must 
extend traditional notions of computation to encompass these architectures. This has been 
done to some extent in research on parallel and distributed computing (e.g., [10]) and with 
architectures such as systolic arrays [47]. However, as computing systems become more par- 
allelized and decentralized and employ increasingly simple individual processors, it becomes 
harder and harder to design and program such systems. 

Cellular automata (CAs) are a simple class of systems that captures some of the features 
of systems in nature listed above: large numbers of homogeneous components (simple finite 
state machines) extended in space, no central control, and limited communication among 
components. Given that there is no programming paradigm for implementing parallel com- 
putations in CAs, our research investigates how genetic algorithms (GAs) can evolve CAs 
to perform computations requiring coordination among many cells. In other words, the 
GA's job is to design ways in which the actions of simple components with local information 
and communication give rise to coordinated global information processing. In addition, we 
have adapted a framework — "computational mechanics" — that can be used to discover how 
information processing is embedded in dynamical systems [12] and thus to analyze how com- 
putation emerges in evolved CAs. Our ultimate motivations are two-fold: (i) to understand 
collective computation and its evolution in natural systems and (ii) to explore ways of auto- 
matically engineering sophisticated collective computation in decentralized multi-processor 
systems. 

In previous work we described some of the mechanisms by which genetic algorithms 
evolve cellular automata to perform computations, and some of the impediments faced by the 
GA [56]. We also briefly sketched our adaptation of the computational mechanics approach 
to understanding computation in the evolved CAs [17, 19, 20]. In this paper we give a more 
fully developed account of our research to date on these topics, report on new results, and 
compare our work with other work on GAs, CAs, and distributed computing. 

This paper is organized as follows. In Sec. 2-4, we review cellular automata, define 
a computational task for CAs — "density classification" — that requires global coordination, 
and describe how we used a GA to evolve cellular automata to perform this task. In Sec. 5-8, 
we describe the results of the GA evolution of CAs. We first describe the different types of 
CA computational strategies discovered by the GA for performing the density classification 
task. We then make the notion of computational "strategies" more rigorous by defining 
them in terms of embedded particles, particle interactions, and geometric "subroutines" 
consisting of these components. This high-level description enables us to explain how the 
space-time configurations generated by the evolved CAs give rise to collective computation 
and to predict quantitatively the CAs's computational performance. We then use embedded- 

and serial processing have been termed "von Neumann style", von Neumann was also one of the inventors 
of "non von Neumann-style" architectures such as cellular automata. 
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Figure 1: The components of one-dimensional, binary-state, r = 1 
("elementary") CA 110 shown iterated one time step on a configura- 
tion with N = 11 lattice sites and periodic boundary conditions (i.e., 

SN = So)- 



particle descriptions to explain the evolutionary stages by which the successful CAs were 
produced by the GA. Finally, in Sec. 9 we compare our research with related work. 



2. Cellular Automata 

An one-dimensional cellular automaton consists of a lattice of N identical finite-state ma- 
chines (cells), each with an identical topology of local connections to other cells for input and 
output, along with boundary conditions. Let E denote the set of states in a cell's finite-state 
machine and let k — |E| denote the number of states per cell. Each cell is indexed by its 
site number i — 0,1, . . . , N — 1. A cell's state at time t is denoted by s\, where s\ G E. 
The state s\ of cell i together with the states of the cells to which it is connected is called 
the neighborhood rjj of cell i. Each cell obeys the same transition rule <f>(i]j), that gives the 
update state s- +1 = <f>(rjl) for cell % as a function of r}\. We will drop the indices on s\ and 77* 
when we refer to them as general (local) variables. 

We use s* to denote the configuration of cell states: 

s* - s* s* s* 
& — *o*i • • • *JV-1- 

A CA {E^, 0} thus specifies a global map $ of the configurations: 

$ . y> n -> E^, 

with 

s t+i = $ ( g t)_ 
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In some cases in the discussion below, $ will also be used to denote a map on subconfigura- 
tions of the lattice. Whether $ applies to global configurations or sub configurations should 
be clear from context. 

In a synchronous CA, a global clock provides an update signal for all cells: at each t all 
cells synchronously read the states of the cells in their neighborhood and then update their 
own states according to s\ = <f>(r)f). 

The neighborhood rj is often taken to be spatially symmetric. For one-dimensional 
CAs, rji = Sj_ r , . . . , s , . . . , s i+r , where r is the CA's radius. Thus, : E 2r+1 — > E. For 
small-radius, binary-state CAs, in which the number of possible neighborhoods is not too 
large, is often displayed as a look-up table, or rule table, that lists each possible 77 together 
with its resulting output bit s t+1 . 

The architecture of a one-dimensional, (k,r) = (2, 1) CA is illustrated in Fig. 1. Here, 
the neighborhood of each cell consists of itself and its two nearest neighbors and the boundary 
conditions are periodic: sn = so- 

The 256 one-dimensional, (k,r) = (2,1) CAs are called elementary CAs (ECAs). Wol- 
fram [81] introduced a numbering scheme for one-dimensional CAs. The output bits can 
be ordered lexicographically, as in Fig. 1, and are interpreted as the binary representation 
of an integer between and 255 with the leftmost bit being the least-significant digit and 
the rightmost the most-significant digit. In this scheme, the elementary CA pictured here is 
number 110. 

In this paper we will restrict our attention to synchronous, one-dimensional, (k, r) = 
(2, 3) CAs with periodic boundary conditions. This choice of parameters will be explained 
below. For ease of presentation, we will sometimes refer to a CA by its transition rule 
(e.g., as in "the CA ..."). 

The behavior of CAs is often illustrated using space-time diagrams in which the con- 
figurations s* on the lattice are plotted as a function of time. Fig. 2 shows a space-time 
diagram of the behavior of ECA 110 on a lattice of N = 149 sites and periodic boundary 
conditions, starting from an arbitrary initial configuration (the lattice is displayed horizon- 
tally) and iterated over 149 time steps with time increasing down the figure. A variety of 
local structures are apparent to the eye in the space-time diagram. They develop over time 
and move in space and interact. 

ECAs are among the simplest spatial dynamical systems: discrete in time, space, and 
local state. Despite this, as can be seen in Fig. 2, they generate quite complicated, even 
apparently aperiodic behavior. The architecture of a CA can be modified in many ways — 
increasing the number of spatial dimensions, the number k of states per cell, and the neigh- 
borhood size r; modifying the boundary conditions; making the local CA rule probabilistic 
rather than deterministic; making the global update $ asynchronous; and so on. 

CAs are included in the general class of "iterative networks" or "automata networks" . 
(See [30] for a review.) They are distinguished from other architectures in this class by their 
homogeneous and local (r <C N) connectivity among cells, homogeneous update rule across 
all cells, and (typically) relatively small k. 

For quite some time, due to their appealingly simple architecture, CAs have been suc- 
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Figure 2: A space-time diagram illustrating the typical behavior of 
elementary CA (ECA) 110. The lattice of 149 sites, displayed horizon- 
tally at the top, starts with so being an arbitrary initial configuration. 
Cells in state 1 are displayed as black and cells in state are displayed 
as white. Time increases down the page. 

cessfully employed as models of physical, chemical, biological, and social phenomena, such 
as fluid flow, galaxy formation, earthquakes, chemical pattern formation, biological morpho- 
genesis, and vehicular traffic dynamics. They have been considered as mathematical objects 
about which formal properties can be proved. They have been used as parallel computing 
devices, both for the high-speed simulation of scientific models and for computational tasks 
such as image processing. In addition, CAs have been used as abstract models for studying 
"emergent" cooperative or collective behavior in complex systems. For discussions of work 
in all these areas, see, e.g., [4, 26, 30, 37, 46, 59, 44, 55, 71, 82]. 

3. A Computational Task for Cellular Automata 

It has been shown that some CAs are capable of universal computation; see, e.g., [3, 50, 67]. 
The constructions either embed a universal Turing machine's tape states, read/write head 
location, and finite-state control in a CA's configurations and rule or they design a CA rule, 
supporting propagating and interacting particles, that simulates a universal logic circuit. 
These constructions are intended to be in-principle demonstrations of the potential compu- 
tational capability of CAs, rather than implementations of practical computing devices; they 
do not give much insight about the computational capabilities of CAs in practice. Also, in 
such constructions it is typically very difficult to design initial configurations that perform 
a desired computation. Moreover, these constructions amount to using a massively parallel 
architecture to simulate a serial one. 

Our interest in CA computation is quite different from this approach. In our work, CAs 
are considered to be massively parallel and spatially extended pattern-forming systems. Our 
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goal is to use machine-learning procedures, such as GA stochastic search, to automatically 
design CAs that implement parallel computation by taking advantage of the patterns formed 
via collective behavior of the cells. 

To this end, we chose a particular computation for a one-dimensional, binary-state CA— 
density classification — that requires collective behavior. The task is to determine whether 
p , the fraction of Is in the initial configuration (IC) s , is greater than or less than a critical 
value p c . If p > Pc the entire lattice should relax to a fixed point of all Is (i.e., $(1^) = 1^) 
in a maximum of T max time steps; otherwise it should relax to a fixed point of all Os ((i.e., 
$(0^) = 0^) within that time. The task is undefined for p = Pc- In our experiments we set 
p c = 1/2 and T max = 2N. The performance V I N (4>) of a CA on this task is calculated by 
randomly choosing / initial configurations on a lattice of N cells, iterating on each IC for 
a maximum of T max time steps, and determining the fraction of the / ICs that were correctly 
classified by — a fixed point of all Is for p > p c , and a fixed point of all 0s otherwise. No 
partial credit is given for final configurations that have not reached an all- Is or all-Os fixed 
point. As a shorthand, we will refer to this task as the "p c = 1/2" task. Defining the task for 
other values of p c is of course possible; e.g., Chau et al. showed that is is possible to perform 
the task for rational densities p c using two one-dimensional elementary CAs in succession 
[6]. 

This task is trivial for a von Neumann-style architecture that holds the IC as an array 
in memory: it simply requires counting the number of Is in s . It also trivial for a two-layer 
neural network presented with each s° on one of its iV input units, all of which feed into 
a single output unit: it simply requires weights set so that the output unit fires when the 
activation reaches the desired threshold p c . In contrast, it is nontrivial to design a CA of our 
type to perform this task: all cells must agree on a global characteristic of the input even 
though each cell communicates its state only to its neighbors. 

The p c = 1/2 task for CAs can be contrasted with the well-studied tasks known as 
"Byzantine agreement" and "consensus" in the distributed computing literature (e.g., [22, 
27]). These are tasks requiring a number of distributed processors to come to agreement 
on a particular value held initially by one of the processors. Many decentralized protocols 
have been developed for such tasks. They invariably assume that the individual processors 
have more sophisticated computational capabilities and memory than the individual cells in 
our binary-state CAs or that the communication topologies are more complicated than that 
of our CAs. Moreover, to our knowledge, none of these protocols addresses the problem of 
classifying a global property (such as initial density) of all the processors. 

Given this background, we asked whether a GA could design CAs in which collective 
behavior allowed them to perform well above chance (> P^r(0) = 0.5) on this task for a 
range of N. To minimize local processor and local communication complexity, we wanted to 
use the smallest values of k and r for which such behavior could be obtained. Over all 256 
ECAs 0, the maximum performance V$ (4>) is approximately 0.5 for N e {149,599,999}. 
For all CAs evolved in 300 runs of the GA on (k, r) = (2,2) CAs, the maximum V$ (0) 
was approximately 0.58 for N = 149 and approximately 0.5 for N e {599,999}. (The GA's 
details will be given in the next section.) Increasing the radius to r = 3, though, resulted in 
markedly higher performance and more sophisticated collective behavior. As a result, all of 
the experiments described in this paper were performed on one- dimensional (k, r) = (2,3) 
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Figure 3: Space-time diagrams for 4> ma j, the r = 3 local-majority- vote 
CA. In the left diagram, po < 1/2; in the right diagram, po > 1/2. 

CAs with N e {149,599,999} and periodic boundary conditions. Note that for r = 3, the 
neighborhood size \r)\ = 7. 

One naive candidate solution to the p c = 1/2 task, which we will contrast with the 
GA-evolved CAs, is the r = 3 "majority vote" CA. This CA, denoted <p ma \, maps the center 
cell in each 7-cell neighborhood to the majority state in that neighborhood. Fig. 3 gives two 
space-time diagrams illustrating the behavior of ma j on two ICs, one with p < 1/2 and the 
other with p > 1/2- As can be seen, small high-density (low-density) regions are mapped 
to regions of all Is (Os). But when an all- Is region and an all-Os region border each other, 
there is no way to decide between them and both persist. Thus, </> ma j does not perform the 

p c = 1/2 task. In particular, 'P^ )4 (0 ma j) was measured to be zero for N e {149, 599, 999}. At 
a minimum more sophisticated coordination in the form of information transfer and decision 
making is required. And, given the local nature of control and communication in CAs, the 
coordination among cells must emerge in the absence of any central processor or central 
memory directing the cells. 

Other researchers, building on our work, have examined variations of the p c — 1/2 task 
that can be performed by simple CAs or by combinations of CAs. Capcarrere et al. [5] noted 
that changing the output specification makes the task significantly easier. For example, 
EC A 184 classifies densities of initial conditions within \N/2\ time steps by producing a 
final configuration of a checkerboard pattern (01)* interrupted by one or more blocks of at 
least two consecutive 0s for low-density ICs or at least two consecutive Is for high-density ICs. 
Fuks [32] noted that by using the final configuration of ECA 184 as the initial configuration 
of ECA 232, the correct final configuration of either all-Os or all-Is is obtained. Note that 
Fuks' solution requires a central controller that counts time up to \N/2\ steps in order to 
shift from a CA using rule 184 to one using rule 232. 

Both solutions always yield correct density classification, whereas the single-CA p c — 1/2 
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task is considerably more difficult. In fact, it has been proven that no single, finite-radius 
two-state CA can perform the p c = 1/2 task perfectly for all N [18, 48]. 

Our interest is not focused on developing a new and better parallel method for per- 
forming this specific task. Clearly, one-dimensional, binary-state cellular automata are far 
from the best architectures to use if one is interested in performing density classification 
efficiently. As we have emphasized before, the task is trivial within other computational 
model classes. Instead, our interest is in investigating how GAs can design CAs that have 
interesting collective computational capabilities and how we can understand those capabil- 
ities. Due to our more general interest, we have been able to adapt this paradigm to other 
spatial computation tasks — tasks for which the above specific solutions do not apply and for 
which even approximate hand-designed CA solutions were not previously known [19]. 

4. Evolving Cellular Automata with Genetic Algorithms 

Genetic algorithms are search methods inspired by biological evolution [40]. In a typical 
GA, candidate solutions to a given problem are encoded as bit strings. A population of such 
strings ("chromosomes") is chosen at random and evolves over several generations under 
selection, crossover, and mutation. At each generation, the fitness of each chromosome 
is calculated according to some externally imposed fitness function and the highest-fitness 
chromosomes are selected preferentially to form a new population via reproduction. Pairs of 
such chromosomes produce offspring via crossover, where an offspring receives components 
of its chromosome from each parent. The offspring chromosomes are then subject at each bit 
position to a small probability of mutation (i.e., being flipped). After several generations, 
the population often contains high-fitness chromosomes — approximate solutions to the given 
problem. (For overviews of GAs, see [35, 54].) 

We used a GA to search for (k,r) = (2,3) CAs to perform the p c = 1/2 task. 2 Each 
chromosome in the population represented a candidate CA — it consisted of the output bits 
of the rule table, listed in lexicographic order of neighborhood (cf. <fi in Fig. 1). The 
chromosomes representing CAs were of length 2 2r+1 = 128 bits. The size of the space in which 
the GA searched was thus 2 128 — far too large for exhaustive enumeration and performance 
evaluation. 

Our version of the GA worked as follows. 

First, an initial population of M chromosomes was chosen at random. The fitness F^(<f)) 
of a CA in the population was computed by randomly choosing / ICs on a lattice of N 
cells, iterating the CA on each IC either until it arrived at a fixed point or for a maximum of 
T max time steps. It was then determined whether the final configuration was correct — i.e., 
the all-Os fixed point for p < 1/2 or the all-Is fixed point for p > 1/2. F^(<p) was the 
fraction of the / ICs on which <fi produced the correct final behavior. No credit was given 
for partially correct final configurations. 

In each generation, (1) a new set of / ICs was generated; (2) F^((j>) was computed for 
each CA in the population; (3) CAs in the population were ranked in order of fitness (with 

2 The basic framework was introduced in Ref. [61] to study issues of phase transitions, computation, and 
adaptation. For a review of the original motivations and a critique of the results see Ref. [57] . 
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ties broken at random); (4) a number E of the highest fitness CAs (the "elite") was copied 
to the next generation without modification; and (5) the remaining M — E CAs for the next 
generation were formed by crossovers between randomly chosen pairs of the elite CAs. With 
probability p c , each pair was crossed over at a single randomly chosen locus /, forming two 
offspring. The first child inherited bits through I from the first parent and bits / + 1 through 
127 from the second parent; vice versa for the second child. The parent CAs were chosen for 
crossover from the elite with replacement — that is, an elite CA was permitted to be chosen 
any number of times. The two offspring chromosomes from each crossover (or copies of the 
parents, if crossover did not take place) were mutated (0 — >• 1 and 1 — > 0) at each locus with 
probability p m . This process was repeated for G generations during a single GA run. Note 
that since a different sample of ICs was chosen at each generation, the fitness function itself 
is a random variable. 

We ran experiments with two different distributions for choosing the M chromosomes in 
the initial population and the set of / ICs at each generation: (i) an "unbiased" distribution 
in which each bit's value is chosen independently with equal probability for and 1, and 
(ii) a density-uniform distribution in which strings were chosen with uniform probability 
over A G [0, 1] or over p £ [0, 1], where A is the fraction of Is in 0's output bits and p 
is the fraction of Is in the IC. Using the density-uniform distribution for the initial CA 
population and for the ICs considerably improved the GA's ability to find high fitness CAs 
on any given run. (That is, we could use 50% fewer generations per GA run and still find 
high performance CAs.) The results we report here are from experiments in which density- 
uniform distributions were used. 

The experimental parameters we used were M = 100, / = 100, E = 20, iV = 149, 
^max = 2iV, p c = 1.0 (i.e., crossover was always performed), p m = 0.016, and G = 100. Ex- 
periments using variations on these parameters did not result in higher performance solutions 
or faster convergence to the best-performance solutions. 

To test the quality of the evolved CAs we used V l f with N 6 {149,599,999}. This 
performance measure is a more stringent quality test than the fitness Fff° used in the GA 
runs: under the ICs are chosen from an unbiased distribution and thus have po close to 
the density threshold p = 1/2. Such ICs are the hardest cases to classify. Thus, V$ gives 
a lower bound on other performance measures. In machine learning terms, the ICs used to 
calculate F^g are the training sets for the CAs and the ICs used to calculate P^° 4 are larger 
and harder test sets that probe the evolved CA's generalization ability. 

5. Results of Experiments 

In this section we describe the results from 300 independent runs of this GA, with different 
random number seeds. 

In each of the 300 runs, the population converged to CAs implementing one of three 
types of computational strategies. The term "strategy" here refers to the mechanisms by 
which the CA attains some level of fitness on the p c — 1/2 task. These three strategy types, 
"default", "block expanding", and "particle", are illustrated in Figures 4-6. In each figure, 
each row contains two space-time diagrams displaying the typical behavior of a CA <fi that 
was evolved in a GA run. Thus, CAs from six different runs are shown. In each row, p < 1/2 
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in the left space-time diagram and p$> 1/2 in the right. The rule tables and measured Vff 
values for the six CAs are given in Table 1. 

5.1 Default Strategies 

In 11 out of the 300 runs, the highest performance CAs implemented "default" strategies, 
which on almost all ICs iterate to all 0s or all Is, respectively. The typical behavior of two 
such CAs, j £ and 0j e f, is illustrated in Figures 4(a) and 4(b). Default strategies each have 
Vlf((f)) ~ 0.5, since each classifies one density range (e.g., p < 1/2) correctly and the other 
(p > 1/2) incorrectly. Since the initial CA population is generated with uniform distribution 
over A it always contains some CAs with very high or low A. And since A is the fraction 
of Is in the output bits of the look-up table, these extreme-A CAs tend to have one or the 
other default behavior. 

5.2 Block-Expanding Strategies 

In most runs (280 out of 300 in our experiments) the GA evolved CAs with strategies like 
those shown in Figures 5(a) and 5(b). 0g xp in Fig. 5(a) defaults to an all- Is fixed point (right 
diagram) unless there is a sufficiently large block of adjacent (or almost adjacent) 0s in the 
IC. In this case it expands that block until 0s fill up the entire lattice (left diagram). 0g xp 
in Fig. 5(b) has the opposite strategy. It defaults to the all-Os fixed point unless there is a 
sufficiently large block of Is in the IC. The meaning of "sufficiently large block" depends on 
the particular CA, but is typically close to the neighborhood size 2r + 1. For example, 0g xp 

will expand blocks of 8 or more 0s and 0g xp will expand blocks of 7 or more Is. 

These "block-expanding" strategies rely on the presence or absence of blocks of Is or 
0s in the IC: blocks of adjacent 0s (Is) are more likely to appear in low- (high-) density 
ICs. Since the occurrence of such blocks is statistically correlated with p , recognizing and 
then expanding them leads to fitnesses above those for the default strategy. The strength 
of this correlation depends on the initial density p and on the lattice size N. Typical 
block-expanding strategies have F^g ~ 0.9 and ~ 0.6. The block-expanding strategies 
designed by the GA are adapted to N = 149; their performances do not scale well to larger 
lattice sizes. This occurs since the probability of a block of, say, seven adjacent Is appearing 
for a given p Q increases with N and this means that the correlation between the occurrence 
of this block and density decreases. This can be seen in the measured values of Vj? for 0| xp 

and 0g xp for longer lattices given in Table 1. 

5.3 Embedded-Particle Strategies 

The block-expanding strategies are not examples of the kind of sophisticated coordination 
and information transfer that we claimed must be achieved for robust performance on the 
p c = 1/2 task. Under these strategies all the computation is done locally in identifying 
and then expanding a "sufficiently large" block. Moreover, the performance on N = 149 
does not generalize to larger lattices. Clearly, the block-expanding strategies are missing 
important aspects required by the task. The third class of strategies evolved by the GA, the 
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Figure 4: Space-time behavior of "default" strategy CAs evolved on 
two different GA runs, (a) </>| f with p = 0.48 (left) and p = 0.74 
(right). On almost all ICs this CA iterates to a fixed point of all 0s, 
correctly classifying only low-/? ICs. (b) £ with po = 0.15 (left) and 
po = 0.56 (right). On almost all ICs this CA iterates to a fixed point 
of all Is, correctly classifying only high-p ICs. 
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Figure 5: Space-time behavior of two "block expanding" CAs evolved 
on different GA runs, (a) </>| X p with pq = 0.46 (left) and po = 0.55 
(right). This CA defaults to a fixed point of all Is unless the IC 
contains a sufficiently large block of adjacent 0s, in which case, that 
block is expanded, (b) </>g xp with po = 0.44 (left) and po = 0.52 
(right). This CA defaults to a fixed point of all 0s unless the IC 
contains a sufficiently large block of adjacent Is, in which case, that 
block is expanded. The classification of the IC is correct in each of 
these four cases. 
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"embedded-particle" strategies, do achieve the coordination and communication we alluded 
to earlier. Typical space-time behaviors of two particle strategies, 0p ar and 0p ar , are given in 
Figures 6(a) and 6(b). It can be seen that there is a transient phase during which the spatial 
and temporal transfer of information about the local density takes place. Such strategies 
were evolved in 9 out of the 300 runs. 

0p ar 's behavior is somewhat similar to that of ma j in that local high-density regions 
are mapped to all Is and local low-density regions are mapped to all 0s. In addition, a 
vertical stationary boundary separates these regions. The set of local spatial configurations 
that make up this boundary is specified in formal language terms by the regular expression 
(1) + 01(0) + , where (w) + means a positive number of repetitions of the word w [41]. 

The stationary boundary appears when a region of Is on the left meets a region of 0s on 
the right. However, there is a crucial difference from ma j: when a region of 0s on the left 
meets a region of Is on the right, a checkerboard region (01) + grows in size with equal speed 
in both directions. A closer analysis of its role in the overall space-time behavior shows that 
the checkerboard region serves to decide which of the two adjacent regions (0s and Is) is the 
larger. It does this by simply cutting off the smaller region and so the larger (0 or 1) region 
continues to expand. The net decision is that the density in the region was in fact below or 
above p c = 1/2. The spatial computation here is largely geometric: there is a competition 
between the sizes of high- and low-density regions. 

For example, consider the right-hand space-time diagram of Fig. 6(a). The large low- 
density region between the lines marked "boundary 1" and "boundary 2" is smaller than 
the large high-density region between "boundary 2" and "boundary 1" (moving left from 
boundary 2 and wrapping around). The left-hand side of the checkerboard region (centered 
around boundary 2) collides with boundary 1 before the right-hand side does. The result is 
that the collision cuts off the inner white region, letting the outer black region propagate. 

In this way, 0p ar uses local interactions and simple geometry to determine the relative 
sizes of adjacent low- and high-density regions that are larger than the neighborhood size. 
As is evident in Figures 6(a) and 6(b), this type of size competition over time happens across 
increasingly larger spatial scales, gradually resolving competitions between larger and larger 
regions. 

The black-white boundary and the checkerboard region can be thought of as signals 
indicating "ambiguous" density regions. Each of these boundaries has local density exactly 
at p c = 1/2. Thus, they are not themselves "classified" by the CA as low or high density. 
The result is that these signals can persist over time. The creation and interaction of these 
signals can be interpreted as the locus of the computation being performed by the CA — they 
form its emergent "algorithm" , what we have been referring to as the CA's "strategy" . 

0p ar (Fig. 6(b)) follows a similar strategy, but with a vertically striped region playing 
the role of the checkerboard region in 0p ar . However, in this case there are asymmetries in 

the speeds of the propagating region boundaries. This difference yields a lower Vjj 4 , as can 
be seen in Table 1. 

These descriptions of the computational strategies evolved by the GA are informal. A 
major goal of our work is to make terms such as "computation", "computational strategy", 
and "emergent algorithm" more rigorous for cellular automata. In the next section we will 
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CA Name 


Rule Table (Hexadecimal) 


-pio 4 

' 149 


-pio 4 

' 599 


-pio 4 

' 999 


*daf 


100111215030114D 
01613507143B05BF 


0.500 


0.500 


0.500 


^dcf 


0BF9D97AF26F4F4B 
F3FF301F0B110DF7 


0.499 


0.499 


0.501 


0exp 


1010614604273F9B 
7FD7D9DF35F53FFF 


0.656 


0.523 


0.504 


, b 

rexp 


02330A4B07016711 
42D080C3CD877B7F 


0.643 


0.513 


0.502 


Vpar 


0504058605000F77 
037755877BFFB77F 


0.775 


0.740 


0.728 


Vpar 


00240066A0A02246 
76EFEFFFFBFFAAFE 


0.766 


0.687 


0.641 



Table 1: CA chromosomes (look-up table output bits) given in hex- 
adecimal and V$ for the six CAs illustrated in Figures 4-6, on lat- 
tices of sizes N = 149, N = 599, and N = 999. To recover the 128-bit 
string giving the CA look-up table outputs, expand each hexadecimal 
digit (left to right, top row followed by bottom row) to binary. This 
yields the neighborhood outputs in lexicographic order of neighbor- 
hood, with the leftmost bit of the 128-bit string giving the output bit 
for neighborhood 00000000, and so on. Since V$ is measured on a 
randomly chosen sample of ICs, it is a random variable. This table 
gives its mean over 100 trials for each CA. Its standard deviation over 
the same 100 trials is approximately 0.005 for each CA for all three 
values of N. For comparison, the best known (k, r) = (2, 3) CAs for 
the p c = 1/2 task have V\% ~ 0.85 (see Sec. 9). This appears to be 
close to the upper limit of V\% for this class of spatial architectures. 

describe how we are using the notions of domains, particles, and particle interactions to 
do this. We will use these notions to answer questions such as, How, precisely, is a given 
CA performing the task? What structural components are used to support this information 
processing? How can we predict and other computational properties of a given CA? Why 
is V X N greater for one CA than for another? What types of mistakes does a given CA make 
in performing the p c = 1/2 task? These types of questions are difficult, if not impossible, to 
answer in terms of local space-time notions such as the bits in a CA's look-up table or even 
the raw space-time configurations produced by the CA. A higher-level description is needed, 
one that incorporates computational structures. 

6. Understanding Collective Computation in Cellular Automata 

In this section we will describe our approach to formalizing the notion of computational 
strategy in cellular automata and in other spatially extended systems. This approach is 
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Figure 6: Space-time behavior of two "particle" CAs evolved on dif- 
ferent GA runs, (a) </>p ar with po = 0.48 (left) and po = 0.51 (right), 
(b) <^ ar with po = 0.48 (left) and p = 0.51 (right). These CAs 
use the boundaries between homogeneous space-time regions to effect 
information transmission and processing. Again, the classification of 
the IC is correct in each of these four cases. 
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Site 199 Site 199 

Figure 7: (a) Space-time diagram illustrating the typical behavior 
of ECA 18 — a CA exhibiting apparently random behavior, i.e., the 
set of length- L spatial words has a positive entropy density as L — > 
oo. (b) The same diagram with the regular domains — instances of 
words in A — filtered out, leaving only the embedded particles P = 
{l(00) n l,n = 0,1,2,...}. (After Ref. [14].) 

based on the computational mechanics framework of Crutchfield [12], as applied to cellular 
automata by Crutchfield and Hanson [15, 38, 39]. This framework comprises a set of methods 
for classifying the different patterns that appear in CA space-time behavior, using concepts 
from computation and dynamical systems theories. These methods were developed as a way 
of analyzing the behavior of cellular automata and other dynamical systems. They extend 
more traditional geometric and statistical analyses by revealing the intrinsic information- 
processing structures embedded in dynamical processes. 

6.1 Computational Mechanics of Cellular Automata 

As applied to cellular automata, the purpose of computational mechanics is to discover an 
appropriate "pattern basis" with which to describe the structural components that emerge 
in a CA's space-time behavior. A CA pattern basis consists of a set A of formal languages 
{A 1 , i — 0, 1, . . .} in terms of which a CA's space-time behavior can be decomposed concisely 
and in a way constrained by the temporal dynamics. Once such a pattern basis is found, 
those cells in space-time regions that are described by the basis can be seen as forming 
background "domains" against which coherent structures — defects, walls, etc. — not fitting 
the basis move. In this way, structural features above and beyond the domains can be 
identified and their dynamics analyzed and interpreted on their own terms. 

For example, consider the space-time diagram of Fig. 7(a), illustrating the apparently 
random behavior of ECA 18. This example is a useful illustration of embedded information 
processing since the coherent structures are not immediately apparent to the eye. The 
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computational mechanics analysis [14, 39] of ECA 18 uses a pattern basis consisting of the 
single domain language A = {A = (0X) + }, where E = {0, 1}. That is, over most regions in 
ECA 18's configurations, every other site is a and the remaining sites are wildcards, either 
or 1. (Often this type of formal-language description of a set of configuration features can 
be discovered automatically via the "e-machine reconstruction" algorithm [12, 38].) 

Crutchfield and Hanson define a regular domain A J as a space-time region that (i) is 
a regular language and (ii) is space- and time-translation invariant. Regular domains can 
be represented by either the set A* of configurations or by the minimal finite-state machine 
M(A l ) that recognizes A*. More specifically, let {A*} be the pattern basis for CA <fi. Then 
the regular domain A* describes the space-time regions of {$*(s°) : t = 0, 1,2, . . .} whose 
configurations are words in A*. Formally, then, a regular domain A* is a set that is 

1. temporally invariant — the CA always maps a configuration in A* to another configu- 
ration in A 1 : $(s) = s', s, s' G A 1 ; and 

2. spatially homogeneous — the same pattern can occur at any site: the recurrent states 
in the minimal finite automaton M(A J ) recognizing A 1 are strongly connected. 

Once a CA's regular domains are discovered, either through visual inspection or by 
an automated induction method, and proved to satisfy the above two conditions, then the 
corresponding space-time regions are, in a sense, understood. Given this level of discovered 
regularity, the domains can be filtered out of the space-time diagram, leaving only the 
"unmodeled" deviations, referred to as domain "walls" , whose dynamics can then be studied 
in and of themselves. Sometimes, as is the case for the evolved CA we analyze here, these 
domain walls are spatially localized, time-invariant structures and so can be considered to 
be "particles". 

In ECA 18 there is only one regular domain A . It turns out that it is stable and so 
is called a regular "attractor" — the stable invariant set to which configurations tend over 
long times, after being perturbed away from it by, for example, flipping a site value [14, 39]. 
Although there are random sites in the domain, its basic pattern is described by a simple 
rule: all configurations are allowed in which every other site value is a 0. If these fixed-value 
sites are on even-numbered lattice sites, then the odd-numbered lattice sites have a wild card 
value, being or 1 with equal probability. The boundaries between these "phase- locked" 
regions are "defects" in the spatial periodicity of A and, since they are spatially localized 
in ECA 18, they can be thought of as particles embedded in the raw configurations. 

To locate a particle in a configuration generated by ECA 18, assuming one starts in a 
domain, one scans across the configuration, from left to right say, until the spatial period-2 
phase is broken. This occurs when a site value of 1 is seen where the domain pattern indicates 
a should be. Depending on a particle's structure, it can occur, as it does with ECA 18, 
that scanning the same configuration in the opposite direction (right to left) may lead to the 
detection of the broken domain pattern at a different site. In this case the particle is defined 
to be the set of local configurations between these locations. 

Due to this ECA 18's particles are manifest in spatial configurations as blocks in the set 
P = {l(00)™l,n = 0, 1,2, . . .}, a definition that is left-right scan invariant. Fig. 7(b) shows 
a filtered version of Fig. 7(a) in which the cells participating in A are colored white and the 
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Regular Domain 

A" = {(0(0 + 1))*} 
Particle 

a ~ A" A" = {l(00) ra l, n = 0, 1, 2, . . .} 
Interaction (annihilation) 

o + o^0 (A u ) 



Table 2: ECA 18's catalog of regular domains, particles, and particle 
interactions. The notation p ~ A l A J means that p is the particle 
forming the boundary between domains A* and A- 7 . 

cells participating in P are colored black. The spatial structure of the particles is reflected in 
the triangular structures, which are regions of the lattice in which the particle — the breaking 
of A°'s pattern — is localized, though not restricted to a single site. 

In this way, ECA 18 's configurations can be decomposed into natural, "intrinsic" struc- 
tures that ECA 18 itself generates; viz., its domain A and its particle P. These structures 
are summarized for a CA in what we call a particle catalog. The catalog is particularly simple 
for ECA 18; cf. Table 2. The net result is that ECA 18's behavior can be redescribed at the 
higher level of particles. It is noteworthy that, starting from arbitrary initial configurations, 
ECA 18's particles have been shown to follow a random walk in space-time on an infinite 
lattice, annihilating in pairs whenever they intersect [24, 39]. One consequence is that there 
is no further structure, such as coherent particle groupings, to understand in ECA 18's dy- 
namics. Thus, one moves from the deterministic dynamics at the level of the CA acting 
on raw configurations to a level of stochastic particle dynamics. The result is that ECA 18 
configurations, such as those in Fig. 7(a), can be analyzed in a much more structural way 
than by simply classifying ECA 18 as "chaotic". 

In the computational mechanics view of CA dynamics, embedded particles carry various 
kinds of information about local regions in the IC. Given this, particle interactions are the 
loci at which this information is combined and processed and at which decisions are made. 
In general, these structural aspects — domains, particles, and interactions — do not appear 
immediately. As will be seen below, often there is a initial disordered period, after which the 
configurations condense into well-defined regular domains, particles, and interactions. To 
capture this relaxation process we define the condensation time t c as the first iteration at 
which the filtered space-time diagram contains only well-defined domains in A and the walls 
between them. In other words, at t c , every cell participates in either a regular domain, of 
width at least 2r + 1, in a wall between them, or in an interaction between walls. (See Refs. 
[16] and [42] for a more detailed discussion of the condensation phase and its consequences.) 

6.2 Computational Mechanics of Evolved Cellular Automata 

This same methodology is particularly useful in understanding and formalizing the computa- 
tional strategies that emerged in the GA-evolved CA. Fortunately, in the following exposition 
most of the structural features in the evolved CA are apparent to the eye. Fig. 6(a) sug- 
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Regular Domains 


A u = {0+} 


A'={1+} 


A*= {(01)+} 


Particles (Velocities) 


a ~ A u A 1 (— ) 


P ~ A 1 A u (0) 
r? ~ A 1 A 2 (3) 


7 ~ A u A" (-1) 


5 ~ A 2 A (-3) 


/i ~ A 2 A 1 (1) 


Interactions 


decay 
react 
annihilate 


ct — > 7 + /x 
y5 + 7 — »• 77, // + /3 5, ri + 5^>P 
rj + pi^$ (A 1 ), 7 + 5-^0 (A ) 



Table 3: 

^par' s catalog of regular domains, particles (including veloc- 
ities in parentheses), and particle interactions. Note that this catalog 
leaves out possible three-particle interactions. 

gests that an appropriate pattern basis for 0p ar is A = {A = 00 + ,A 1 = 11 + ,A 2 = (01)+}, 
corresponding to the all-white, all-black, and checkerboard regions. Similarly, Fig. 6(b) sug- 
gests that for 0p ar we use A = {A = 00 + ,A 1 = 111 + ,A° = (011)+}, corresponding to the 
all-white, all-black, and striped regions. 

Note that a simple shortcut can be used to identify domains that are spatially and 
temporally periodic. If the same "pattern" appears repeated over a sufficiently large (^> r 
cells by r time steps) space-time region, then it is a domain. It is also particularly easy to 
prove such regions are regular domains. Exactly how the pattern is expressed as a regular 
language or as a minimal finite-state machine typically requires closer inspection. 

Once identified, the computational contributions of these space-time regions can be 
easily understood. The contributions consist solely of the generation of words in the cor- 
responding regular language. Since this requires only a finite amount of spatially localized 
memory, its direct contribution to the global computation required by the task is minimal. 
(The density of memory vanishes as the domain increases in size.) The conclusion is that the 
domains themselves, while necessary, are not the locus of the global information processing. 

Fig. 8 is a version of Fig. 6 with 0p ar 's and 0p ar 's regular domains filtered out. The 

result reveals the walls between them, which for 0p ar and 0p* ar are several kinds of embedded 
particles. The particles in Fig. 8 are labeled with Greek letters. This filtering is performed 
by a building a transducer that reads in the raw configurations and can recognize when 
sites are in which domain. The transducer used for Fig. 8(a), for example, outputs white 
at each site in one of 0p ar 's domains and black at each site participating in a domain wall. 
(The particular transducer and comments on its construction and properties can be found 
in Appendix A. The general construction procedure is given in Ref. [15].) 

Having performed the filtering, the focus of analysis shifts away from the raw config- 
urations to the new level of embedded-particle structure. The questions now become, Are 
the computational strategies explainable in terms of particles and their interactions ? Or, is 
there as yet some unrevealed information processing occurring that is responsible for high 
performance? 
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(b) 

Figure 8: (a) Version of Fig. 6(a) with the regular domains filtered 
out, revealing the particles and their interactions, (b) Filtered version 
of Fig. 6(b). 
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Regular Domains 


A u = {0+} 


A 1 = 


A^ = {(011)+} 


Particles (Velocities) 


a ~ A 1 A u (0) 
5 ~ A 2 A 1 (-3) 


P ~ A u A 1 (1) 
~ A A 2 (3) 


7 ~ A 1 A z (0) 
/i ~ A 2 A (3/2) 


Interactions 


decay 
react 
annihilate 


a — > 7 + /x 
/3 + 7 r/, n + P->5, rj + 5 ^ (3 
rj + pi^$ (A°), 7 + 5->0 (A 1 ) 



Table 4: </>par' s catalog of regular domains, particles (including their 
velocities in parentheses), and particle interactions. 

Tables 3 and 4 list all the different particles that are observed in the space-time behavior 
of 0p ar and 0p ar , along with their velocities and the interactions that can take place between 
them. Note that these particle catalogs do not include all possible structures, for example, 
possible three-particle interactions. The computational strategies of 0p ar and 0p ar can now 
be analyzed in terms of the particles and their interactions as listed in the particle catalogs. 



6.3 Computational Strategy of 0p ar 

In a high-performance CA such as 0p ar , particles carry information about the density of local 
regions in the IC, and their interactions combine and process this information, rendering a 
series of decisions about po- How do these presumed "functional" components lead to the 
observed fitness and computation performance? 

Referring to Table 3 and Fig. 8(a), 0p ar 's /3 particle is seen to consist of the zero- velocity 
black-to-white boundary. /3 carries the information that it came from a region in the IC in 
which the density is locally ambiguous: the density of l fc fc , when determined at its center, 
is exactly p c . The ambiguity cannot be resolved locally. It might be, however, at a later 
time, when more information can be integrated from other regions of the IC. 

Likewise, the a "particle" consists of the white-to-black boundary, but unlike the (3 
particle, a is unstable and immediately decays into two particles 7 (white-checkerboard 
boundary) and \x (checkerboard-black boundary). Like /3, a indicates local density ambiguity. 
The particles into which it decays, 7 and //, carry this information and so they too are 
"ambiguous density" signals. 7 carries the information that it borders a white (low density) 
region and \x carries the information that it borders a black (high density) region. The two 
particles also carry the mutual information of having come from the same ambiguous density 
region where the transient a was originally located. They carry this positional information 
about a's location by virtue of having the same speed (±1). 

To see how these elements work together over a space-time region consider the left side 
of the left-hand (po < 1/2) diagram in Fig. 8(a). Here, a decays into a 7 and a ji particle. 
The ji then collides with a /3 before its companion 7 (wrapping around the lattice) does. 



22 



This indicates that the low-density white region, whose right border is the 7, is larger than 
the black region bordered by the fx. The fx- (3 collision creates a new particle, a 8, that carries 
this information ("low-density domains") to the left, producing more low-density area. 5, 
a fast moving particle, catches up with the 7 ("low density") and annihilates it, producing 
A over the entire lattice. The result is that the white region takes over the lattice before 
the maximum number of iterations has passed. In this way, the classification of the (low 
density) IC has been correctly determined by the spatial algorithm — the steps we have just 
described. In the case of 0p ar , this final decision is implemented by 5's velocity being three 
times that of 7's. 

On the right side of the right-hand (p > 1/2) diagram in Fig. 8(a), a converse situation 
emerges: 7 collides with f3 before fx does. The effective decision indicates that the black 
region bordered by fx is larger than the white region bordered by 7. In symmetry with the 
Li- (3 interaction described above, the j-f3 interaction creates the 77 particle that catches up 
with the fx and the two annihilate. In this way, the larger black region takes over and the 
correct density classification is effected. 

A third type of particle-based information processing is illustrated at the top left of 
the right-hand diagram in Fig. 8(a). Here, an a decays into a 7 and a fx. In this case, the 
white region bordered by 7 is smaller than the black region bordered by fx. As before, 7 
collides with the (3 on its left, producing 77. However, there is another (3 particle to the right 
of ix. Instead of the fx proceeding on to eventually collide with the 77, the fx first collides 
with the second (3. Since the fx borders the larger of the two competing regions, its collision 
is slightly later than the 7-/3 collision to its left. The fx- (3 collision produces a 5 particle 
propagating to the left. Now the 77 and the S approach each other at equal and opposite 
speeds and collide. Since 77 is carrying the information that the white region should win and 
5 is carrying the information that the black region should win, their collision appropriately 
results in an "ambiguity" signal — here, a f3 that later on interacts with particles from greater 
distances. But since 77 traveled farther than S before their collision, a (3 is produced that is 
is shifted to the right from the original a. The net effect — the net geometric subroutine — is 
to shift the location of density ambiguity from that of the original a particle in the IC to a 
P moved to the right a distance proportional to the large black region's size relative to the 
white region's size. 

Even though this f3 encodes ambiguity that cannot be resolved by the information 
currently at hand — that is, the information carried by the 77 and 5 that produce it — this (3 
actually carries important information in its location, which is shifted to the right from the 
original a. To see this, refer to Fig. 9, an enlargement of the right diagram of Fig. 8(a) with 
some particle labels omitted for clarity. W and B denote the lengths of the indicated white 
(low density) and black (high density) regions in the IC. Given the particle velocities listed 
in Table 3 and using simple geometry it is easy to calculate that the (3, produced by the r]-5 
interaction, is shifted to the right by 2(B — W) cells from the a's original position. The shift 
to the right means that the high-density region (to the left of the leftmost (3) has gained 
B — W sites in size as a result of this series of interactions. In terms of relative position the 
local particle configuration (3af3 becomes fl'jfxP and then r]5, which annihilate to produce 
a final (3. This information is used later on, when the rightmost 7 collides with the new 
(3 before its partner fx does, eventually leading to black taking over the lattice, correctly 
classifying the (p > 1/2) IC. 
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Site 148 



Figure 9: An enlargement and relabeling of the right diagram of 
Fig. 8(a) with some particle labels omitted for clarity. W is the length 
of the leftmost white region, B is the length of the black region to 
its right, and d is the amount by which the [3 produced by the rj-S 
interaction has been shifted from the leftmost a. Given the particle 
velocities listed in Table 3 and using simply geometry, it is easy to 
calculate that d = 2(B - W). 
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Figure 10: (a) Type-1 misclassification by </>p ar , with p = 0.48. Even 
though po < p c , at t c (here, t = 8) the lengths of the black regions sum 
to 65 cells and the lengths of the white regions sum to 44 cells. This 
leads to a misclassification of IC density, (b) Type-2 misclassification 
by 0p ar , starting with p = 0.52. At t c (here also, t = 8) the sum of 
the lengths of the black regions is 65 cells and the sum of the lengths 
of the white regions is 61 cells. However, even though these condensed 
lengths correctly reflect the fact that po > p c , the black regions in the 
IC's center occur within white regions in such a way that they get 
cut off. Ultimately this yields a large white region that wins over the 
large black region and the IC is misclassified. 



It should now be clear in what sense we say that particles store and transmit informa- 
tion and that particle collisions are the loci of decision making. We described in detail only 
two such scenarios. As can be seen from the figures, this type of particle-based information 
processing occurs in a distributed, parallel fashion over a wide range of spatial and temporal 
scales. The functional organization of the information processing can be usefully analyzed 
at three levels: (i) the information stored in the particles and decisions made during their 
interaction, (ii) geometric subroutines that are coordinated groupings of particles and inter- 
actions that effect intermediate-scale functions, and (iii) the net spatial computation over 
the whole lattice and from t — to t — T max . 

In the next section we will argue that these levels of description of a CA's compu- 
tational behavior — in terms of information transmission and processing by particles and 
their interactions — is analogous to, but significantly extends, Marr's "representation and 
algorithm" level of information processing. It turns out to be the most useful level for un- 
derstanding and predicting the computational behavior of CAs, both for an individual CA 
operating on particular ICs and also for understanding how the GA evolved the progres- 
sive innovations in computational strategies over succeeding generations. (We will put these 
latter claims on a quantitative basis shortly.) 
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0p ar almost always iterates to either all Os or all Is within T max = 2N time steps. 
The errors it makes are almost always due to the wrong classification being reached rather 
than no classification being effected. 0p ar makes two types of misclassifications. In the 
first type, illustrated in Fig. 10(a), 0p ar reaches the condensation time t c having produced 
a configuration whose density is on the opposite side of p c than was po- The particles and 
interactions then lead, via a correct geometric computation, to an incorrect final configura- 
tion. In the second type of error, illustrated in Fig. 10(b), the density p tc is on the same 
side of the threshold as p , but the configuration is such that islands of black (or white) cells 
are isolated from other black (or white) regions and get cut off. This error in the geometric 
computation eventually leads to an incorrect final configuration. As N increases, this type 
of error becomes increasingly frequent and results in the decreasing V}^ values at larger N; 
see Table 1. 



6.4 Computational Strategy of 0p ar — Failure Analysis 

As noted in Table 4, the space-time behavior of 0p ar exhibits three regular domains: A 
(white), A 1 (black), and A 2 (striped). The size-competition strategy of 0p ar is similar to that 
of 0p ar . In 0p ar , the striped region plays the role of 0p ar 's checkerboard domain. However, 
when compared to 0p ar , the roles of the two domain boundaries A°A 1 and A 1 A° are now 
reversed. In 0p ar , A°A 1 is stable, while A 1 A° is unstable and decays into two particles. 

Thus, the strategy used by 0p ar is, roughly speaking, a 0-1 site-value exchange applied to 
0p ar 's strategy. Particles a, (3, 7, 5, 77, and p are all analogous in the two CAs, as are their 
interactions, if we exclude three-particle interactions; cf. Tables 3 and 4. They implement 
competition between adjacent large white and black regions. In analogy with the preceding 
analysis for 0p ar 's strategy, these local competitions are decided by which particle, a 7 or a 
p, reaches a (3 first. 

In 0p ar , 7 and p each approach (3 at the rate of one cell per time step. In 0p ar , although 
7 is now a stationary particle, it also effectively approaches (3 at the rate of 1 cell per time 
step, since (3 moves with velocity 1. p approaches (3 at the rate of 1/2 cell per time step, 
the velocity of p minus the velocity of (3. Thus, there is an asymmetry in 0p ar 's geometric 
computation that can result in errors of the type illustrated in Fig. 11(a). There the IC, 
with p = 0.52, condenses around iteration t c ~ 20 into a block of 85 black cells adjacent to 
a block of 64 white cells. The 7 particle, traveling at velocity 1 relative to (3, reaches (3 in 
approximately 85 time steps. The p particle, traveling at velocity 1/2 relative to (3, reaches 
(3 in approximately 103 time steps. Thus, even though the black cells initially outnumber 
the white cells, the black region is cut off first and white eventually wins out, yielding 
an incorrect classification at time step 165. In contrast, 0p ar , with its symmetric particle 
velocities, reaches a correct classification on this same IC (Fig. 11(b)). 

Like </>p ar , 0p ar makes two types of classification errors — the type in which 0p ar reaches 
t c with a configuration whose density p tc is on the opposite side of p c than p and the type 
illustrated in Fig. 11(a). The first type ("type 1") is an error in how the CA condenses 
into domains and particles. The second type ("type 2") is due to asymmetries in particle 
velocities. Consider Fig. 12, a blow-up of part of the right-hand diagram of Fig. 8(b). To the 
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Figure 12: Blow-up of part of right-hand diagram of Fig. 8b, illus- 
trating asymmetries in </>p ar 's particle velocities that can result in 
misclassifications . 
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left of the a (labeled) is an isolated black island and to the right is a white island. Together 
these two contiguous islands are bounded by two (3 particles on either side. Inside, as in 0p ar , 
an a decays into a 7 and a p. The resulting set of local particle interactions is such that 
the two islands compete for space within the two bounding /3's, ending with the creation of 
a new (3. If W and B respectively denote the lengths of the white and black islands, then 
after a series of interactions — P'jpP — > rjS — > (3 — the white region (to the left of the original 
leftmost pi) gains 2W — B sites in size. Thus, to increase this region's size the internal white 
island must be at least half the size of its adjacent black island. 

It is evident, therefore, that unlike 0p ar , there are asymmetries in 0p ar 's particle "logic", 
and these are biased in favor of classifying high densities. These asymmetries are what make 
^° 4 (0par) !o wer than V l /{<f)*)- (See Table 1.) 



7. Significance of the Particle-Level Description 

There are several alternative ways in which cellular automata such as 0p ar and </>p ar can be 
described as performing a computation. Marr anticipated some of these in delineating the 
various levels of information processing in vision [51]. In principle, our CAs are completely 
described by the 128 bits in their look-up tables. This is too low-level a description, however, 
to be useful for understanding how a given CA performs the p c — 1/2 task. Using this level 
is like trying to understand how a pocket calculator computes the square root function by 
examining the physical equations of motion for the electrons and holes in the calculator's 
silicon circuitry. 

Moreover, attempting interpretation at this level also violates, in a sense, one of the 
central tenets of the century-long study of dynamical systems, namely, that for nonlinear 
systems (e.g., most CAs), the local space-time equations of motion do not directly determine 
the system's long-term behavior. In the case of CAs it is not the individual look-up table 
neighborhood-output-bit entries acting over a single time step that directly give rise to the 
observed computational strategy. Instead, it is the interaction of subsets of CA look-up table 
entries that over a number of iterations leads to the emergence of domains, particles, and 
interactions. 

A second possibility for describing computational behavior in CAs is in terms of its 
detailed space-time behavior — i.e., the series of raw configurations of Os and Is. Again, this 
description is too low level for understanding how the solutions to the task are implemented. 
This approach is like trying to understand how a calculator's square root function is per- 
formed by taking a long series snapshots of the positions and velocities of the electrons and 
holes traveling through the integrated circuits. This prosaic view is analogous to Marr's 
"hardware implementation" level of description [51]. 

A third possibility is to describe the CA in terms related to the task's required in- 
put/output mapping and the task's computational complexity. For example, on a particular 
set of 10 4 random ICs, half with p < 1/2 and half with p > 1/2, 0p ar correctly classified 81% 
of the p < 1/2 ICs and 74% of the p > 1/2 ICs. On average 0p ar took 81 time steps to reach 
a fixed point; the maximum time was 227. The computational complexity of the p c = 1/2 
task on a serial architecture is O(N). This kind of operational analysis is roughly at Marr's 
"computational theory" level. 
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None of these levels of description gives much insight into how the task is being per- 
formed by a particular CA in terms of what information processing is being done and how 
it leads to a particular measured performance. What is needed is an intermediate-level de- 
scription whose primitives are informationally related to the task at hand. This is what 
the computational mechanics level of particles and particle interactions gives us. How ever 
one might detect the primitives at this level, it is analogous to Marr's "representation and 
algorithm" level, in which particles can be seen as representing aspects of the IC and their 
actions and interactions can be seen as the CA's emergent algorithm. 

Representations, in the form of data structures, and algorithms have been studied ex- 
tensively for von Neumann-style computers, but there have been few attempts to define such 
notions for decentralized spatially extended systems such as CAs. One can, of course, in 
principle implement any standard data structure and algorithm in a computation-universal 
CA, such as the game of Life CA [3], by simulating a von Neumann-style computer. How- 
ever, this is not a particularly useful notion of information processing if one's goals are to 
understand how nonlinear systems in nature compute. It is even more problematic if one 
wishes to design computation in complex decentralized spatially extended architectures. We 
believe that it will be essential to develop new "macroscopic-level" vocabularies in order to 
explain how collective information processing takes place in such architectures. (One benefit 
of this development would be an understanding of how to program these architectures in 
genuinely parallel ways.) 

A close reading shows that Marr's analysis of the descriptional levels required for visual 
processing misses several key issues. These are (i) the fact that representations emerge 
from the dynamics (i.e., are intrinsic to the dynamics), (ii) a clear formal definition is 
required to remove the subjectivity of detecting these intrinsic representations, and (iii) 
their functionality is entailed by a new level of dynamics, also intrinsic, that describes their 
interactions. As illustrated above in several cases, the computational mechanics framework 
that we are employing here makes these distinctions and provides the necessary concepts 
and methods to address these issues [12, 13]. The result is that we can analyze in detail the 
emergent computational strategies in the evolved CAs. 

Our particle-level description forms an explanatory vocabulary for emergent computa- 
tion in the context of one-dimensional, binary-state CAs. As was described above, particles 
represent various kinds of information about the IC and particle interactions are the loci of 
decision making that use this information. The resulting particle "logic" gives a functional 
description of how the computation takes place that is neither directly available from the 
CA look-up table nor from the raw space-time configurations produced by iterating the CA. 
It gives us a formal notion of "strategy" , allowing us to see, for example, how the strategies 
of 0p ar and 0p ar are similar and how they differ. One immediate consequence of this level 

of analysis is that we can say why 0p ar 's strategy is weaker. 

The level of particles and interactions is not only a qualitative description of spatial 
information processing, it also enables us to make quantitative predictions about computa- 
tional performance. In Refs. [16] and [42], we describe how to model a CA using its particle 
catalog and statistical properties at the condensation time. For each of several different 
CAs 0, we compare the model's prediction for Vj^((f>) as well as for the average time taken 
to reach a fixed point with the values measured for the actual 0. Some of these compar- 
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Figure 13: (a) Partial ancestral tree of (pp ar - (b) ^149 (</>) (diamonds) 

and ^149(0) (crosses) of the CAs 4>i in (a)- Six of the data points are 
marked with the name of the corresponding CA. 

isons will be summarized in Sec. 8.7. The degree to which a model's predictions agree with 
the corresponding CA's behavior indicates the degree to which the particle-level description 
captures how the CA is actually performing the computation. Since, as we will show, the 
model's predicted performance and the observed performance are very close, we conclude 
that the particle-level description accurately captures the intrinsic computational capability 
of the evolved CAs. 

8. Evolutionary History of 0p ar : Innovation, Contingency, and Exaptation 

The structural analysis of CA space-time information processing that we have just outlined 
also allows us to understand the evolutionary stages during which the GA produces CAs. 
Here we will show how the functional components — domains, particles, and interactions — 
arise and are inherited across the evolutionary history of a GA run. We will also demonstrate 
a number of evolutionary dynamical phenomena, such as the historical contingency of func- 
tional emergence and the appearance of initially nonfunctional behaviors that later are key 
to the final appearance of high performance CAs. 

To begin, Figures 13(a) and 13(b) illustrate 0p ar 's evolutionary history. Fig. 13(a) 
gives a partial tree of the parent-child relationships between some of 0p ar 's ancestors, each 
numbered by its generation of birth. Note that, since elite CAs can survive for more than 



30 



CA Name 


Rule Table (Hexadecimal) 




<pio* 

' 149 


<t>7b 


F6EFFFFFFFFFFFFF 
6B9F7F93FFFFBFFF 


0.50 


0.500 


08 


0400448102000FFF 
6B9F7F93FFFFBFFF 


0.61 


0.500 


013 


0400458100000FFF 
6B9F77937DFFFF7F 


0.82 


0.513 


017 


0500458100000FBF 
6B9F75937FBFFF5F 


0.92 


0.595 


018 


0500458100000FBF 
6B9F75937FBDF77F 


0.98 


0.691 


033 


0504058100840FB7 
4BBF55837FBDF77F 


0.97 


0.735 



Table 5: CA chromosomes (look-up table output bits) given in hex- 
adecimal, -F149, and V12q for the six ancestors of ^p ar described in 
this section. (See Fig. 1 for directions on how to recover the CA rule 
table outputs from the hexadecimal code.) The F^g values in this 
table are those calculated in the CA's generation of birth by the GA; 
the Vl2g values given are the means over 100 trials of the performance 
function, calculated after the run was complete. When tested over 100 
trials, the standard deviation of F^g is approximately 0.02 and the 
standard deviation of V\% is approximately 0.005 for each CA. 
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Figure 14: Space-time behavior of generation 7 and 8 ancestors, (f>jb 
and 4>s, of 0p ar . Both start from the same IC with po = 0.11. 



one generation, parents and offspring, e.g. 0io and 0i3, can have nonconsecutive generation 
labels. The CAs listed are those with the best fitness in the generation in which they arose. 
Table 5 lists the look-up tables, F^g, and V\% for the six ancestors of 0p ar described below. 

Fig. 13(b) plots F™ (diamonds) and V\% (crosses) versus generation of birth for each 
of these ancestors. In generations 0-7 the best CA in the population has F^g = 1^1% — 0.5, 
achieved by a "default" strategy like those of Fig. 4. Starting at generation 8, evolution 
proceeds in a series of abrupt increases in F™. More gradual increases are seen in Pf4 9 ; of 
course, this statistic is not available to and thus is not used by the GA. The occasional small 
decreases result from the stochastic nature of the fitness and performance evaluations. 

The goal now is to use the functional analysis to understand why these increases come 
about. To do so, we present a series of space-time diagrams, in Figs. 14-19, that compare 
space-time behaviors of CAs along the ancestral tree of Fig. 13(a). In each figure, space- 
time behavior with the same IC is given for two ancestrally related CAs to highlight the 
similarities and evolutionary innovations. 

8.1 76 and 8 (Fig. 14) 

Here the IC has very low density: p = 0.11. 07&, a "default" CA, always iterates to all Is, 
and in Fig. 14(a) misclassifies the IC. 7a (not shown) is a default CA that always iterates to 
all 0s. 7 b's look-up table contains mostly Is (see Table 5) and 07 fl 's look-up table contains 
mostly 0s. They crossed over at locus 52 to produce <f)$: therefore the first part of 0s 's 
look-up table contains mostly 0s, and the rest is mostly Is. In Fig. 14(b) 08 iterates to all 
0s and correctly classifies the IC. 
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8.2 8 and 13 (Fig. 15) 

Here po — 0.47. In Fig. 15(a) 8 quickly iterates to all Is. This is its more typical behavior 
than that shown in Fig. 14(b); very small regions of black quickly grow to take over the 
entire lattice. In this way, 8 is only slightly better than a default CA like cpn] it correctly 
classifies all high-density ICs and only a small number of very low density ICs. Note that 
while F/ 4 ° 9 °(0 8 ) > 0.5, V\%{^) remains at 0.5. 08 can be said to be carrying out a "default- 
with-exceptions" strategy. All runs that produced such strategies went on to converge on 
either block-expanding strategies or embedded-particle strategies. 

Interestingly, the checkerboard domain A 2 = {(01) + } is produced by 8 on some ICs 
(Fig. 15). However, A 2 does not contribute to 8 's fitness or performance. It is a functionally 
neutral feature. To determine this, we modified 8 's rule table to prevent the checkerboard 
domain from propagating. The two relevant entries are 0101010 — > and 1010101 — > 1. 
Flipping the output bit on either or both of these entries produces CAs with = 0.61 and 
^149 = 0-5; that is, fitness and performance identical to those of 8 . (The standard deviations 
of F-jfg for this and the other variant CAs discussed in this section were approximately 
0.02. The standard deviations of V{% were approximately 0.005.) Appropriating biological 
terminology, we can consider the checkerboard domain, at this generation, to be an adaptively 
neutral trait of 8 . 

013 represents a steep jump in fitness over 8 , as seen in Fig. 13(b). 0i3 is a block- 
expanding CA. It maps ICs to all Is unless there is a sufficiently large block of adjacent 
0s in the IC, in which case that block expands to eventually fill up the entire lattice, as in 
Fig. 15(b), which is a correct classification by t = T max . On some ICs, 0i 3 also produces 
a checkerboard domain and a similar but less ordered region; the latter can be seen in 
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Figure 16: Space-time behavior of 0p ar ancestors, 0i3 and 0i7 arising 
in generation 13 and 17. Both start from the same IC with po = 0.58. 



Fig. 15(b). We determined, in a fashion similar to that just explained above, that these 
traits also were adaptively neutral. 

8.3 0i3 and i7 (Fig. 16) 

Here po = 0.58. 0i3 expands blocks of 0s on many ICs with p > 1/2, including the one in 
this figure, resulting in misclassifications. In fact, many high-density ICs with p pa 0.5 are 
misclassified and, while 0i3 has markedly higher F^g than its ancestors, its performance 
7^49 is only marginally improved (see Table 5). 

013 creates three types of boundaries between white and black domains. Two of them 
are shown in Fig. 16(a), labeled a and a'. The a, like 0p ar 's a, exists for only a single time 
step and then decays into i] and p, whereas a' remains stable. A third type, a", does not 
appear for this IC but can be seen in Fig. 15(b). a' and a" support the block-expanding 
strategy, whereas a leads to a competition between white and black regions similar to that 
seen in 0p ar . 

In contrast, consider Fig. 16(b), where the same p = 0.58 IC is correctly classified by 
0i 7 . Recalling Table 5 we see that 0rr's F^g and V\% are both substantially higher than 
those of 0i 3 . 017 's higher F^ and V{% can be explained at the particle level. The particles 
are labeled in Fig. 16(a) and Fig. 16(b). 

017 creates the same set of particles as 0i3 (on some ICs it expands 0-blocks, not shown 
in Fig. 16(b)) but with different frequencies of occurrence: a 1 and a" appear less often than 
in 0i3 and a appears more often. Thus, 0i3 is more likely to expand 0-blocks, and thus make 
more errors, than i7 on ICs for which p > 0.5. Given 100 randomly generated p > 0.5 ICs, 
a' and a" were created by i3 in 86% of the ICs and by 0i7 in 12% of the ICs. Whenever 
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Figure 17: Space-time behavior of generation 17 and 18 ancestors, 4>n 
and 0i8, of 0p ar . Both are shown starting from the same IC that has 
po = 0.45. 

a 1 or a" are created, the final configuration will be all 0s regardless of whether a is created. 
That is, block expanding dominates other behaviors. This explains why nipping output bits 
to suppress the checkerboard domain does not significantly affect 0i3's F^g and Vl® 9 , but 
does significantly affect these values for 17 . When the checkerboard domain was suppressed 
in 17 , F^g decreased only to 0.86 but V^ g decreased to 0.54. 

Following Gould and Vrba [36], we consider the checkerboard domain A 2 to be an 
example of an "exaptation" — a trait that has no adaptive significance when it first appears, 
but is later co-opted by evolution to have adaptive value. According to Gould and Vrba, such 
traits are common in biological evolution. In the evolutionary innovation that goes from 0i 3 
to 0i7 the exaptation of A 2 in 0i3 makes just this transition to functionality associated with 
a marked increase in fitness and performance. This, in turn, leads to the change in dominant 
computational strategy away from block expanding. 



8.4 0i7 and i8 (Fig. 17) 

Here po = 0.45. The misclassification by 0i7 (illustrated in Fig. 17(a)) is compared with the 
correct classification by 0i7's higher fitness and performance child 0is (Fig. 17(b)). Both 
CAs create similar particles, but in 0i7 the velocity of the (3 particle is 1/3, whereas in 0i 8 
its velocity is zero. 

In Fig. 17(a), the white region (marked W) is larger than the black region to its right 
(marked B). Since the (3 particles have positive velocity, the black regions to W's left and 
right both expand to the right. Coming in from the left, this decreases the size of W. On 
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Figure 18: Space-time behavior of generation 18 and 33 ancestors, (pis 
and c/>33, of 4>p &T . Both start from the same IC with po = 0.61. 



the other side, the (rightmost) (5 particle moves away from the W region. This asymmetry 
allows the B region to win the size competition, when the B region should not. 

The asymmetry between black and white regions is corrected in 0i§ by the change 
in /3's velocity to zero. This makes the size competition between black and white regions 
symmetric. The result, seen in Fig. 17(b), is that the smaller B region is now cut off by the 
\i and P, the W region is allowed to grow, and the correct classification is made. 

8.5 0i 8 and 33 (Fig. 18) 

Here po = 0.61. <pi$, though an improvement over <pn, still carries with it remnants of its 
ancestors' block-expanding past. In Fig. 18(a), 0i 8 misclassifies the IC by creating an a" 
particle instead of an a particle at a white-black (ambiguous density) boundary in the IC. 
Recall that in 17 , a' or a" particles were created in 12% of the random ICs with p > 1/2. 
In 18 this frequency is about the same: 13%. Thus, 0is's main innovation over 17 is the 
zero velocity of the f3 particle and the resulting symmetric size-competition strategy. 

In 033, a descendant of 0ig, neither a' or a" particles are created. This explains the 
higher and V\% of 033 over those of 0is- 

8.6 033 and 64 (Fig. 19) 

Here, p = 0.45. 0p ar , here named 64 to denote its generation of birth, outperforms 33 
since it classifies more low-density ICs correctly. On some low-density ICs like the one used 
in Fig. 19(a), 033 condenses too much of the IC into black regions, a type 1 error. These 
then win the size competition, resulting in a misclassification. 064 makes type 1 errors on 
low-density ICs less often (e.g., as seen in Fig. 19(b)), it correctly classifies the same IC as in 
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Fi gure 19: Space-time behavior of generation 33 ancestor 033 of 0p ar 
and 064 (itself 0p ar ). Both start from the same IC, which has density 

/'o 0.45. 

Fig. 19(a). On the same set of 10 4 ICs, 62% of 033 's errors were on low-density ICs, whereas 
only 43% of 64 's errors were on low- density ICs. 

8.7 Particle Models of Evolved Cellular Automata 

The "natural history" of 0p ar ' s evolution given above demonstrates how we can understand 

the jumps in F^g and 'Pl^g in terms of regular domains and particles — functional compo- 
nents in the CA's dynamical behavior. The GA's actions can be described at a low level as 
manipulating bits in CA rule tables via crossover and mutation, but a better understand- 
ing of the evolutionary process emerges when we describe its actions at the higher level of 
manipulating particle types, velocities, and interactions. An important component of this 
viewpoint is that the particles and their interactions lead to higher fitness. To test the 
hypothesis more quantitatively, we ask to what extent the CAs' observed fitnesses and per- 
formances can be predicted from the particle and interaction properties alone. To this end, 
in collaboration with Wim Hordijk, we have constructed "ballistic" particle models of the 
CAs 08 through 064. These models are intended to isolate the particle-level mechanisms and 
in so doing allow us to determine how much of the CA behavior this level captures. 

A ballistic particle model M.^ of a CA consists of the catalog of particle types, veloc- 
ities, interactions, and their frequencies of occurrence at t c . M.^ is "run" by first using the 
particle frequencies to generate an initial configuration s tc of particles at the condensation 
time and then using the catalog of velocities and interactions to calculate the initial par- 
ticles' ballistic trajectories and the products of subsequent particle interactions. The final 
configuration is reached when either all particles have annihilated or when T max — t c steps 
have occurred. This configuration and the actual time at which it was reached gives us Af^'s 
prediction of what 0's classification would be for an IC corresponding to s tc and the time it 
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would take to reach it. Particle models and their analysis are described in detail in Refs. 
[16] and [42]. 



CA Name 


^(0) 




08 


0.500 


0.500 


013 


0.513 


0.524 


017 


0.595 


0.601 


018 


0.691 


0.747 


033 


0.735 


0.765 


064 


0.775 


0.775 



Table 6: The CA and model performances of (fig, (fiis, (fin, (fi\%, 
033, and (fi %A . (After Ref. [16].) Note 

0par has been referred here as 
064- The CA rule tables are given in Table 5. 

A comparison of the performances of the six CAs just analyzed and their particle models 
are given in Table 6. As can be seen, the agreement is within a few percent for most cases. In 
these and the other cases small discrepancies are due to simplifications made in the particle 
models. These include assumptions such as the particles being zero width and interactions 
occurring instantaneously. These error sources are analyzed in depth in Ref. [16]. For 18 
the error is higher, around 8%, due to a long-lived transient domain that is not part of the 
particle catalog used for the model. The main effect of this is that the condensation time 
is overestimated on some ICs that generate this domain. This, in turn, means that the 
model describes only the last stages of convergence to the answer configurations, which it 
gets correctly and so has a higher performance than 0i§. For 033 the error is around 4%. 
This appears to be due to errors in estimates of the distribution of particle types at the 
condensation time. 

The conclusion is that the particle-level descriptions can be used to quantitatively pre- 
dict the computational behavior of CAs and so also the CA fitnesses and performances in the 
evolutionary setting. In particular, the results support the claim that it is these higher-level 
structures, embedded in CA configurations, that implement the CA's computational strat- 
egy. More germane to the preceding natural history analysis, this level of description allows 
us to understand at a functional level of structural components the evolutionary process by 
which the CAs were produced. 

9. Related Work 

In Sec. 3 we discussed some similarities and differences between this work and other work 
on distributed parallel computation. In this section we examine relationships between this 
work and other work on computation in cellular automata. 

It should be pointed out that 0p ar 's behavior (and the behavior of many of the other 
highest-performance rules) is very similar to the behavior of the so-called Gacs-Kurdyumov- 
Levin (GKL) CA. This CA was invented not to perform the p c = 1/2 task, but to study 
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reliable computation and phase transitions in one-dimensional spatially-extended systems 
[33]. More extensive work by Gacs on reliable computation with CAs is reported in Ref. 
[34]. 

The present work and earlier work by our group came out of follow-on research to 
Packard's investigation of "computation at the edge of chaos" in cellular automata [60]. 
Originally Wolfram proposed a classification of CAs into four behavioral categories [81]. 
These categories followed the basic classification of dissipative dynamical systems: fixed 
point attractors exhibiting equilibrium behavior, limit cycle attractors exhibiting periodic 
behavior, chaotic attractors exhibiting apparently random behavior, and neutrally stable sys- 
tems at bifurcations exhibiting long transients. Wolfram suggested that the latter category 
was particularly appropriate for implementing sophisticated (even universal) computation. 

Following this with a more quantitative proposal Langton [49] hypothesized that a CA's 
A — the fraction of "non-quiescent states" (here, Is) in its look-up table's output states — 
was correlated "generically" with the CA's computational capabilities. In particular, he 
hypothesized that CAs with certain "critical" A values, which we denoted A c , would be more 
likely than CAs with A values away from A c to be able to perform complex computations, or 
even universal computation. Packard's goal was to test this hypothesis by using a genetic 
algorithm to evolve (k,r) = (2,3) CAs to perform the p c = 1/2 task, starting from an 
initial population chosen from a distribution that was uniform over A G [0, 1]. He found that 
after 100 generations, the final populations of CAs, when viewed only as distributions over 
A, tended to cluster close to A c values. He interpreted this clustering as evidence for the 
hypothesized connection between A c and computational ability. 

In Ref. [57] we were able to show, via theoretical arguments and empirical results, 
however, that the most successful CAs for the p c — 1/2 task must have A ~ 1/2. This value 
of A is quite different from Packard's quoted A c values. We argued that Packard's results were 
due to an artifact in his particular implementation of the GA. Using more standard versions 
and his version of GA search we obtained results that disagreed with Packard's findings and 
that were roughly in accord with our theoretical predictions that high performance CAs were 
to be found at A ~ 1/2, far from A c , and not in, for example, Wolfram's fourth CA category. 
We were also able to explain the deviations of our results from the theoretical predictions. 
The current work came out of the discovery of phenomena, such as embedded-particle CAs 
[17, 20], that were not found in Ref. [60]. Moreover, according to Langton the A = 1/2 
value for our high-performance CAs corresponds to CAs in Wolfram's chaotic class. The 
space-time diagrams shown earlier demonstrate that they are not "chaotic" ; their behavior, 
in fact, puts them in the first (fixed-point) category. 

Later, other researchers performed their own studies of evolving cellular automata for 
the p c = 1/2 task. Sipper and Ruppin [65, 66] used a version of the GA to evolve "nonuniform 
CAs" — CA-like architectures in which each cell uses its own look-up table to determine its 
state at each time step. For a lattice of size N, the individuals in the GA population are 
the N look-up tables making up a nonuniform CA. Sipper and Ruppin used this framework 
to evolve r = 1 nonuniform CAs to perform the p c = 1/2 task, as well as other tasks. They 
reported the discovery of nonuniform CAs with V{% values comparable to that of 0p ar . They 

did not report Vj^ 4 results for any other value of iV nor did they give statistics on how often 
high-performance nonuniform CAs were evolved. Moreover, no structural analysis of CA 
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space-time behavior or GA population dynamics was given. Thus, it is unclear how the high 
fitnesses were obtained, either dynamically or evolutionarily. 

Andre et al. used a genetic programming algorithm to evolve (k,r) = (2,3) CAs with 
N = 149 to perform the p c = 1/2 task [1]. This algorithm discovered particle CAs with 
higher V{% than that of <^ ar (e.g., 0.828 versus 0.776). We obtained the look-up table for 
one such CA, GP (D. Andre, personal communication) and found that on larger lattices, the 
performance of GP was close to that of <^ ar (P5 9 o 9 (0gp) = 0.765 and VlM GP ) = 0.723; cf. 
Table 1). It is not clear whether the improvement in V\% was due to the genetic programming 
representation CA look-up tables or some other factor related to increased computational 
resources. For example, their runs had a 500-fold larger population size M and 10-fold larger 
number of ICs over our GA runs. Their runs did, however, find high-performance CA in 
average numbers of generations that were half those in our GA. Thus, the computational 
resources they used in their evolutionary search were approximately 2500 times larger than 
in our GA runs. 

Paredis [62] and Juille and Pollack [45] experimented with coevolutionary learning tech- 
niques to improve the GA's search efficiency to find embedded particle CAs for the p c = 1/2 
task. The latter work specifically rewarded or penalized ICs of particular densities, depend- 
ing on the amount of information ICs of those densities provided for distinguishing fitnesses 
between CAs in the population. This resulted in a higher percentage of GA runs in which 
high-performance embedded-particle CAs were discovered and in the discovery of higher- 
performance CAs than in any of the non-coevolutionary runs. The highest performance 
CA discovered had V\§ = 0.863 ± 0.001, = 0.822 ± 0.001, and Vf^ = 0.804 ± 0.001. 
Unfortunately, the performance of this coevolved CA, although high on small lattices (e.g. 
iV = 149), decays more rapidly with lattice size than the GKL rule, which happens to have 
lower performance than the coevolved rule on small lattices. This is appears to be the result 
of the more complex domains that preclude, through additional persistent particles, conver- 
gence to the answer configurations, 0^ or l-^. Compared to the coevolved CAs, the GKL 
CA is one of the CAs that maintains high performance on larger lattices. 

Our own work has been extended to other tasks, most thoroughly to a global synchro- 
nization task for which we have performed similar analyses to those given in this paper 
[19]. 

Our notion of computation via particles and particle-interactions derives from that in- 
troduced by the computational mechanics framework [12, 38, 39] and so differs considerably 
from the notions used in most other work on designing CAs for computation. For example, 
propagating particle-like signals were used in the solution to the Firing Squad Synchroniza- 
tion Problem [52, 58, 77], in Smith's work on CAs for parallel formal-language recognition 
[68], and in Mazoyer's work on computation in one-dimensional CAs [53]. However, in all 
these cases, the particles and their interactions were designed by hand to be the explicit be- 
havior of the CA. That is, the particles are explicitly coded in each cell's local state and their 
dynamics and their interactions are coded directly into the CA lookup table. Typically, their 
interactions were effected by a relatively large number of states per site. Steiglitz, Kamal, 
and Watson's carry-ripple adder [70] and the universal computer constructed in the Game 
of Life [3] both used binary-state signals consisting of propagating periodic patterns. But, 
again, the particles were explicitly designed to ride on top of a quiescent background and 
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their interaction properties were carefully hand coded. In Squier and Steiglitz's "particle 
machine" [69] and in Jakubowski, Steiglitz, and Squier's "soliton machine" [43], particles 
are the primitive states of the CA cells. Moreover, their interaction properties are explicitly 
given by the CA rule table. These machines are essentially kinds of lattice gas automata 
[23] that operate on "particles" directly. (Other work on arithmetic in cellular automata 
has been done by Sheth, Nag, and Hellwarth [64] and Clementi, De Biase, and Massini [8], 
among others.) 

In contrast to these, particles in our system are embedded as walls between regular 
domains. They are often apparent only after those domains have been discovered and filtered 
out. Their structures and interaction properties are emergent properties of the patterns 
formed by the CAs. Notably, although each cell has only two possible states, the structures 
of embedded particles are spatially and temporally extended, and so are more complex than 
atomic or simple periodic structures. Typically, these structures can extend over spatial 
scales larger than the CA radius. For example, the background domain of the elementary 
CA (ECA 110) shown in Fig. 2 has a temporal periodicity of 7 time steps and a spatial 
periodicity of 14 sites, markedly larger than the r = 1 nearest-neighbor coupling. 

10. Conclusion 

Our philosophy is to view CAs as systems that naturally form patterns (such as regular 
domains) and to view the GA as taking advantage — via selection and genetic variation — of 
these pattern-forming propensities so as to shape them to perform desired computations. 
Within this framework, we attempt to understand the behavior of the resulting CAs by 
applying tools, such as the computational mechanics framework, formulated for analyzing 
pattern-forming systems. The result gives us a high-level description of the computationally 
relevant parts of the system's behavior. In doing so, we begin to answer Wolfram's last 
problem from "Twenty problems in the theory of cellular automata" (Wolfram, 1985): "What 
higher-level descriptions of information processing in cellular automata can be given?" We 
believe that this framework will be a basis for the "radically new approach" that Wolfram 
claimed will be required for understanding and designing sophisticated computation in CAs 
and other decentralized spatially extended systems. 

Our analysis showed that there are three levels of information processing occurring dur- 
ing iterations of the evolved high-performance CAs. The first was the type of information 
storage and transmission effected by the particles and the type of "logical" operations im- 
plemented by the particle interactions. The second, higher level comprised the geometric 
subroutines that implemented intermediate-scale computations. We analyzed in detail two 
of these that were important to the size competition between regions of low and high density. 
We also showed how variations in the particles led to several types of error at this level. The 
third and final level is that of the global computation over the entire lattice up to the answer 
time. This is the level at which fitness is conferred on the CAs. 

We analyzed in some detail the natural history that led to the emergence of such com- 
putationally sophisticated CAs. The evolutionary epochs typically proceed in a set sequence, 
with earlier epochs setting the (necessary) context for the later, higher performance ones. 
Often the jumps to higher epochs were facilitated by exaptations — changes in adaptively 
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neutral traits appearing in much earlier generations. 

There are a number of fruitful directions for future work. The first is to extend the 
lessons learned here to more general evolutionary search algorithms and pattern forming dy- 
namical systems. The problem of choosing a genetic representation of dynamical systems that 
helps, or at least does not hinder, the search will play an important role in addressing this. 
The evolution of CAs that operate on two-dimensional images rather than one-dimensional 
strings will also help address this issue and also open up application areas, such as iterative 
nonlinear image processing [11]. 

We also need to develop substantially better analytical descriptions of the search's pop- 
ulation dynamics and of how the intrinsic structures in CAs interact with that dynamics. 
Although the evolution of CAs is a very simplified problem from the biological perspective, 
the evolutionary time scale of the population dynamics and the development time scale of 
the CAs result in a two-time-scale stochastic dynamical system that is difficult to analyti- 
cally predict. Such predictions, say of how to set the mutation rate or population size for 
effective search, are centrally important both for basic understanding of evolutionary mech- 
anisms and for practical applications. Progress on quantitatively predicting the population 
dynamics occurring during epochal evolution has been made [73, 74]. The adaptation of 
the "statistical dynamics" approach introduced there to the evolution of CAs will be an im- 
portant, but difficult, step toward understanding complicated genotype-to-phenotype maps. 
The latter is highly relevant for using such search methods on complex problems. 

Another quantitative direction is the estimation of computational performance of dis- 
tributed systems based on higher-level descriptions. The results, reported here and described 
in detail in Refs. [16] and [42], on predicting CA computational performance are encour- 
aging. Constructing a more accurate model along with a quantitative analytical model of 
higher-level computation in CAs will help us understand how much the embedded CA struc- 
tures contribute individually to overall fitness. And this, in turn, will allow us to monitor the 
evolutionary mechanisms that lead to the emergence of collective computation in coordinated 
groups of functional units. 
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A. Domain Filter 

In this appendix we describe the properties and construction of 0p ar 's domain-recognizing 
and filtering transducer. 

The transducer, shown in Fig. 20, reads in binary CA configurations and outputs strings 
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of the same length, the lattice size N, in the domain-wall alphabet {A, 0, 1, 2, w}. In this 
alphabet A indicates that the transducer has not yet "synchronized" (see below) to the 
domain or wall structures in the configuration, {0,1,2} label each of the three domains, 
respectively, and w indicates a wall between domains. In the filtered space-time diagrams w 
is mapped to black and all other output symbols map to white. 

Briefly, 0p ar 's domain-wall transducer is constructed as follows. 0p ar has three domains, 
each of which can be described by simple finite-state machines. These machines form the 
recurrent states of the transducer. When the transducer first begins to read in the configu- 
ration, it may take several steps to disambiguate the site values and identify the appropriate 
domain in which they are participating. Working through the transitions and transient states 
that lead to the recurrent (domain) states determines the transitions from the start state. 
When the transducer is reading site values consistent with one of these domains, but then 
encounters site values that are not consistent with it (e.g. values indicating walls), then some 
number of additional site values must be read in to determine the domain type into which 
the transducer has moved. Such transitions determine the transducer's domain-to-domain 
transitions. 

Note, that due to the steps required to initially read in a sufficient number of site values 
to recognize the domains and walls, a process that we call synchronization, the transducer 
may have to read some portion of the configuration that it has already read, as it wraps 
around due to the lattice's periodic boundary conditions. This takes at most one additional 
pass over the configuration. 

The general construction procedure for domain- wall transducers is given in Ref. [15]. 
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